-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
follow up librime#806, avoid path-string conversion #310
Conversation
src/types_ext.cc
Outdated
@@ -263,6 +263,10 @@ namespace UserDbReg{ | |||
return {}; | |||
} | |||
|
|||
string file_name(const T& t) { | |||
return t.file_path().u8string(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it expected that file_name(db)
should return UTF-8 encoded path (instead of CP_ACP
on Windows)?
UTF-8 string is appropriate if the returned string path is used to extract schema ID, or passed to librime code as path.
The returned string path might need to be converted to ANSI encoding before given to lua file-system functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, lua uses ANSI encoding path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let's make it ANSI encoded.
Note this is the original build error because the C++ class no longer has the getter file_name
.
A reminder from the summary of the refactor in librime:
file_name
's output, ANSI encoded path, is suitable for accessing the file system using Lua binding to system functions.
If any wants to get schema ID, dictionary ID from the file name, it has to be first converted to UTF-8 string, in order to support path with Chinese characters on Windows.
When constructing a path from schema ID or UTF-8 encoded string read from config, conversion is also needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ANSI encoded path is subtle, but this is lua's limitation. We need a custom source code of lua if we want UTF-8 encoded path.
These changes are in my commit. Anything else? |
src/types.cc
Outdated
@@ -308,7 +308,8 @@ namespace ReverseDbReg { | |||
using T = ReverseDb; | |||
|
|||
an<T> make(const string &file) { | |||
an<T> db = New<ReverseDb>(string(RimeGetUserDataDir()) + "/" + file); | |||
an<T> db = New<ReverseDb>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the only necessary change to pass the compilation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the only one.
I got more errors after resolving the first. I'm commenting inline for every change.
src/modules.cc
Outdated
@@ -17,8 +17,11 @@ static bool file_exists(const char *fname) noexcept { | |||
} | |||
|
|||
static void lua_init(lua_State *L) { | |||
const auto user_dir = std::string(RimeGetUserDataDir()); | |||
const auto shared_dir = std::string(RimeGetSharedDataDir()); | |||
// path::string() returns native encoding on Windows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RimeGetUserDataDir
, RimeGetSharedDataDir
are deprecated.
They still work, but less efficient. Because the internal data has changed from string
to path
, they will do extra conversion and allocating string in static variable.
src/opencc.cc
Outdated
opencc::Config config; | ||
converter_ = config.NewFromFile(config_path); | ||
// OpenCC accepts UTF-8 encoded path. | ||
converter_ = config.NewFromFile(config_path.u8string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change fixes the wrong path encoding given to opencc on Windows.
Opencc does it UTF-8 to ANSI string conversion internally.
} | ||
|
||
string get_shared_data_dir() { | ||
RimeApi* rime = rime_get_api(); | ||
return string(rime->get_shared_data_dir()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, RimeApi::get_shared_data_dir
is deprecated because its inefficiency. Internally it does the same as what the line has changed into, plus assigning the string to a static local variable, then return its c_str()
. The C-string is constructed back into another string
object in the original code. This string allocation is unnecessary but was needed simply because the original librime API was written in C.
Avoid using `RimeGet*Dir()` from `rime_api`. They return native path on Windows and coding conversion is incurred. When passing path from lua to librime class, use either rime::path object or UTF-8 encoded string.
Changed |
LGTM. Added some code to ensure it can build with librime <= 1.9.0 |
Avoid using
RimeGet*Dir()
fromrime_api
. They return native path on Windows and coding conversion is incurred.When passing path from lua to librime class, use either rime::path object or UTF-8 encoded string.